<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
  <title>Loading Data</title>
  <base href="../">
  <link type="text/css" href="styles.css" rel="stylesheet">
</head>

<body>

<hr align="left" size="2" width="550">
<h2><a name="LoadingData"></a>Loading Data into DataWarrior</h2>
<hr align="left" size="2" width="550">

<p>Apart from its own native file formats, <span class="keyword">DataWarrior</span> 
also reads and writes TAB-delimited and comma-separated text files as well as SD-files,
which are the de-facto industry standard for exchanging chemical information. In addition
to reading data from files, data may be pasted from the clipboard or retrieved from databases.
After reading data from any source, <span class="keyword">DataWarrior</span> analyses every column
to understand the kind of data it contains, i.e. whether it contains numerical and/or category data,
whether the row contains empty values, and more. It also checks for correlations and creates
default views and filters.</p>
<p>If <span class="keyword">DataWarrior</span> was installed correctly, then every file type
discussed in this section should have a proper icon assigned and double clicking a file's
icon should result in <span class="keyword">DataWarrior</span> opening the file.
This section explains the interaction with files and the clipboard.
</p>
<br>

<h3><i><a name="NativeFiles"></a><img class="icon" src="help/img/icons/dwar.jpeg" width="64" height="64" align="left" hspace="16">
Native DataWarrior Files</i></h3>
<p>
Whenever you save any data from <span class="keyword">DataWarrior</span> to open it later from the same
application, then a native <span class="keyword">DataWarrior</span> file ending with <b>.dwar</b> is the preferred file type.
In addition to the plain data, <b>.dwar</b> files may contain the following kind of information:
<li>Which views are visible, how they are arranged and what they display</li>
<li>Which filters are visible, how they are configured and, thus, which data rows are visible</li>
<li>Which row lists are defined and which rows belong to them</li>
<li>An HTML based text describing the file's content</li>
<li>Column related description of how to interpret a column's content, e.g. lookup URLs</li>
<li>Cell related detail data like formatted text and images</li>
<li>Keys and links for on-the-fly retrieval of cell related detail data from external sources</li>
<li>Hidden columns with molecule data such as descriptors and atom coordinates</li>
<li>Macros that allow to completely automate <span class="keyword">DataWarrior</span> (from version 4.0)</li>
<p>To open a native <span class="keyword">DataWarrior</span> file, choose <span class="menu">Open</span>
from the <span class="menu">File</span> menu or just double-click an icon representing a <b>.dwar</b> file.</p>

<br>
<h3><i><a name="TemplateFiles"></a><img class="icon" src="help/img/icons/dwat.jpeg" width="64" height="64" align="left" hspace="16">
DataWarrior Template Files</i></h3>
<p>
A <span class="keyword">DataWarrior Template</span> file contains the complete configuration of views and filters,
as they have been, when  the <span class="keyword">Template</span> file was saved. If you want to store the current
state of views and filters of an open <span class="keyword">DataWarrior</span> window in order to possibly restore
it later with the same or another dataset, you may save a <span class="keyword">Template</span> file. To re-apply a
formerly stored template to an open <span class="keyword">DataWarrior</span> window, choose
<span class="menu">Open Special -> Template...</span> from the <span class="menu">File</span> menu.
You may then select either a <b>.dwat</b> or a <b>.dwar</b> file. In both cases the template will be read from
the file and all views and filters will be replaced by new ones as defined in the file. 
</p>
<br>

<h3><i><a name="MacroFiles"></a><img class="icon" src="help/img/icons/dwam.jpeg" width="64" height="64" align="left" hspace="16">
DataWarrior Macro Files</i></h3>
<p>
<span class="keyword">DataWarrior</span> version 4.0 and above support recording, editing and replaying entire workflows.
These may be stored as part of a native <span class="keyword">DataWarrior</span> file or can be exported into a dedicated
macro file. Similar to templates you may run a macro by opening a dedicated macro file with
<span class="menu">Open Special -> Macro...</span> from the <span class="menu">File</span> menu.
</p>
<br>

<h3><i><a name="SOMFiles"></a><img class="icon" src="help/img/icons/dwas.jpeg" width="64" height="64" align="left" hspace="16">
DataWarrior SOM Files</i></h3>
<p>
By creating a self organized map (SOM) <span class="keyword">DataWarrior</span> can position chemical molecules
or other objects on a two dimensional area in a way, that any object's closest neighbours in the plane are
those objects that are the most similar ones in the dataset.
A calculated SOM is actually a 2-dimensional grid of reference vectors of which everyone resembles
one or more molecules/objects of the dataset. Once these reference vectors are calculated, the objects
are one by one assigned to that reference vector, which is the most similar to the object.
If one intends to map a second set of objects from an external file to a previously calculated SOM,
then these vectors must have be available. For that reason they can be saved as SOM file,
which can later be used to map external objects, which is effectively creating compatible 2-dimensional
object coordinates.
</p>
<br>

<h3><a name="QueryFiles"></a><img class="icon" src="help/img/icons/dwaq.jpeg" width="64" height="64" align="left" hspace="16">
DataWarrior Query Files</h3>
<p>
A <b>.dwaq</b> file or <span class="keyword">Query File</span> does not contain any data.
It rather contains a database query that is performed when the file is opened. Moreover,
it may contain the template information needed to construct certain views and filter settings
after the query result data has been retrieved. Query files are used if data in a database is
frequently changing or to confidentially communicate new results, e.g. via e-mail.
To open a <b>.dwaq</b> file, select <span class="keyword">Open Special -> Query...</span> from the
<span class="menu">File</span> menu, or double-click the icon representing the file.
</p>
<br>

<h3><a name="SDFiles"></a><img class="icon" src="help/img/icons/sdf.jpeg" width="64" height="64" align="left" hspace="16">SD-Files</h3>
<p>
SD-Files are the de-facto industry standard for exchanging chemical structures and associated
alpha-numerical information. It has been developed and published by Molecular Design Ltd. (MDL).
The version most widely used is version 2, which has limited support for stereo chemistry:
A so-called <i>chiral</i> flag defines for the entire molecule, whether it is a racemate of a mixture of enantiomers.
With version 2 SD-files it is not possible to define epimers, mixtures of diastereomers, etc.
In order to tackle the deficiencies, MDL introduced an updated concept <Enhanced Stereo Recognition>
along with an updated file format: Version 3. <span class="keyword">DataWarrior</span> consistently uses this new concept,
which allows to define for any stereo center within a molecule, whether it is absolute or
whether it belongs to a group of stereo centers with a specific relative stereo configuration.</p>
<p>From the <span class="menu">File</span> menu, select <span class="menu">Open...</span>
and use the dialog window to select the SD-file(s) (the file extension is .sdf) to import.
<span class="keyword">DataWarrior</span> reads the entire content of the SD-File, displays rows in the Table View, creates
default 2D- and 3D-Views, a <span class="keyword">Structure View</span> and generates a structure
index (FragFp descriptor), which is needed internally for some structure related tasks.
While the indexing process is underway and its progress bar is visible
in the status area, these functions e.g. sub-structure search are not yet available.</p>
<br>

<h3><a name="TextFiles"></a>Text Files</h3>
<p>TAB delimited and comma separated text files ('.txt' and '.csv') are among the
most portable file formats because they can be created by many programs.
In these text files each line represents a row and all fields within the
row are separated by TABs or commas.
In case a text file contains chemical structures in SMILES format, then
this column's header must be <span class="keyword">Smiles</span>, if you wish
<span class="keyword">DataWarrior</span> to interpret the SMILES and to automatically generate the appropriate
structures.
From the <span class="menu">File</span> menu, select <span class="menu">Open Special</span>
and choose <span class="menu">Textfile...</span></p><br>

<h3><a name="ExampleFiles"></a>Example Data Files</h3>
<p>In the standard <span class="keyword">DataWarrior</span> installation, the <span class="menu">File</span>
menu contains two submenus with direct access to some example files.
The option <span class="menu">Open Reference File</span> covers various files
with chemical structures and related data, e.g. known drugs, pKa values, bioactive compounds,
and other datasets of interest. <span class="menu">Open Example File</span>
provides examples that illustrate non-chemistry related aspects of <span class="keyword">DataWarrior</span>.
Depending on the installation, further submenus may provide quick access to files in user
defined directories.</p><br>

<h3><a name="PasteData"></a>Paste Data from the Clipboard</h3>

<p>If you copy tabular data from any text editor or spreadsheet
application, you may paste it directly into <span class="keyword">DataWarrior</span>. This will open
the data as if it were loaded from a text file. By analyzing the data
<span class="keyword">DataWarrior</span> will try to evaluate whether a header row is present
and if it believes that there is none it will generate default column names.</p>

<p>In the following example some data was selected within a spreadsheet application
and then copied to the clipboard with Ctrl-C.</p>

<p align="center"><img src="help/img/import/spreadsheet.jpeg"></p>

<p>After switching to <span class="keyword">DataWarrior</span> and after choosing <span class="menu">Paste</span>
(Ctrl-V) <span class="keyword">DataWarrior</span> responds by displaying the clipboard's content in a new window.
It has recognized the column named "Smiles" and created an additional column with chemical structures
from the SMILES strings. It also created two graphical default views and, since the data now contains
chemical structures, it also created a dedicated structure view.</p>

<p align="center"><img src="help/img/import/pasted.jpeg"></p>
<br>

<h3><a name="ImportData"></a>Importing Data From Databases</h3>

<p>Depending on your particular version <span class="keyword">DataWarrior</span> may be able to directly retrieve data from
databases. At Actelion the following options exist:</p>

<li>Chemical and biological data from the <span class="keyword">Osiris Database</span></li>
<li>Compounds and data from Actelion's <span class="keyword">Chemical Inventory</span></li>
<li>Protein Crystallization Data</li>
<li>Micro Array Data</li>
<li>Gene Expression Data</li>
<li>Compounds from the <span class="keyword">Screening Compounds Database</span></li>

<br>
<p align="center">Continue with <a href="help/views.html">Main Views</a>...</p>
</body>
</html>
